Java Extension Optimizations by samyron · Pull Request #835 · ruby/json

samyron · 2025-08-13T13:25:05Z

Changelog 📓

Use a segmented buffer for the OutputStream to reduce the System.arraycopy calls made each time the output buffer is resized.
Refactored StringEncoder#encode to include a SWAR-based fast path for basic JSON encoding. The algorithm is from this post. It's the same as the vector-based algorithm in the C extension.
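For readers unfamiliar with SWAR (SIMD within a register), here is a minimal Java sketch of the idea. It assumes the basic encoder only has to escape control characters below 0x20, `"` and `\`. It uses the common XOR-and-subtract formulation over 8-byte words and may differ in detail from the formula in the linked post and in the PR's own code. `SwarEscapeScan`, `chunkNeedsEscape` and `firstEscapeIndex` are hypothetical names.

```java
import java.nio.ByteBuffer;

final class SwarEscapeScan {
    private static final long ONES = 0x0101010101010101L;
    private static final long HIGHS = 0x8080808080808080L;

    // True if any ASCII byte in the 8-byte word is a control character (< 0x20), '"' or '\\'.
    // Subtraction borrows can only start at a byte that already matches, so the overall
    // yes/no answer is exact even if lanes next to a match pick up spurious bits.
    static boolean chunkNeedsEscape(long x) {
        long isAscii = HIGHS & ~x;                                 // high bit set where byte < 0x80
        long ltSpaceOrQuote = (x ^ (ONES * 0x02)) - (ONES * 0x21); // (b ^ 2) <= 0x20: b < 0x20 or b == '"'
        long isBackslash = (x ^ (ONES * 0x5C)) - ONES;             // (b ^ 0x5C) == 0: b == '\\'
        return ((ltSpaceOrQuote | isBackslash) & isAscii) != 0;
    }

    // Index of the first byte that needs escaping, or -1 if the array can be copied verbatim.
    static int firstEscapeIndex(byte[] src) {
        ByteBuffer bb = ByteBuffer.wrap(src);
        int pos = 0;
        // Fast path: skip 8 clean bytes at a time.
        while (pos + 8 <= src.length && !chunkNeedsEscape(bb.getLong(pos))) {
            pos += 8;
        }
        // Pinpoint the offending byte inside the flagged chunk, or scan the short tail.
        for (; pos < src.length; pos++) {
            int b = src[pos] & 0xFF;
            if (b < 0x20 || b == '"' || b == '\\') {
                return pos;
            }
        }
        return -1;
    }
}
```

Non-ASCII bytes are masked out, so UTF-8 multibyte sequences stay on the fast path. The per-byte loop only runs for the chunk that tripped the check and for the final few bytes.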

These features can be toggled with the system properties json.useSegmentedOutputStream and json.useSWARBasicEncoder. Both default to true. I'm happy to remove these. They made testing and benchmarking much easier.
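As an illustration of how such toggles are commonly read (not necessarily how this PR reads them), here is a boolean system property that defaults to `true`. `EncoderFlags` and `flag` are hypothetical names. The property strings are the ones listed above.

```java
final class EncoderFlags {
    static final String USE_SEGMENTED_OUTPUT_STREAM_PROP = "json.useSegmentedOutputStream";
    static final String USE_SWAR_BASIC_ENCODER_PROP = "json.useSWARBasicEncoder";

    // A missing property falls back to "true", so both optimizations are on by default.
    static boolean flag(String name) {
        return Boolean.parseBoolean(System.getProperty(name, "true"));
    }
}
```

`Boolean.parseBoolean` returns `true` only for the string "true" (case-insensitive). Any other explicit value, such as `-Djson.useSWARBasicEncoder=no`, therefore turns the feature off.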

Benchmarks

SegmentedByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.741k i/100ms
Calculating -------------------------------------
                json     18.378k (± 6.3%) i/s   (54.41 μs/i) -    182.805k in  10.011722s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    85.000 i/100ms
Calculating -------------------------------------
                json    857.615 (± 1.3%) i/s    (1.17 ms/i) -      8.585k in  10.012075s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   185.000 i/100ms
Calculating -------------------------------------
                json      1.849k (± 1.0%) i/s  (540.77 μs/i) -     18.500k in  10.005181s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.558k i/100ms
Calculating -------------------------------------
                json     25.217k (± 1.1%) i/s   (39.66 μs/i) -    253.242k in  10.043890s

ByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.560k i/100ms
Calculating -------------------------------------
                json     15.622k (± 0.8%) i/s   (64.01 μs/i) -    157.560k in  10.086737s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    87.000 i/100ms
Calculating -------------------------------------
                json    875.692 (± 0.9%) i/s    (1.14 ms/i) -      8.787k in  10.035282s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   182.000 i/100ms
Calculating -------------------------------------
                json      1.818k (± 0.8%) i/s  (550.15 μs/i) -     18.200k in  10.013389s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.544k i/100ms
Calculating -------------------------------------
                json     25.319k (± 0.9%) i/s   (39.50 μs/i) -    254.400k in  10.048804s

ByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.078k i/100ms
Calculating -------------------------------------
                json     10.829k (± 2.5%) i/s   (92.35 μs/i) -    108.878k in  10.062513s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    78.000 i/100ms
Calculating -------------------------------------
                json    810.901 (± 2.8%) i/s    (1.23 ms/i) -      8.112k in  10.013134s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   128.000 i/100ms
Calculating -------------------------------------
                json      1.269k (± 3.3%) i/s  (788.26 μs/i) -     12.672k in  10.001657s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.178k i/100ms
Calculating -------------------------------------
                json     21.633k (± 1.0%) i/s   (46.23 μs/i) -    217.800k in  10.068853s

SegmentedByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-realworld.rb

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.014k i/100ms
Calculating -------------------------------------
                json     10.203k (± 0.8%) i/s   (98.01 μs/i) -    102.414k in  10.037929s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    79.000 i/100ms
Calculating -------------------------------------
                json    814.479 (± 2.1%) i/s    (1.23 ms/i) -      8.216k in  10.092101s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   136.000 i/100ms
Calculating -------------------------------------
                json      1.358k (± 1.0%) i/s  (736.45 μs/i) -     13.600k in  10.016731s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.246k i/100ms
Calculating -------------------------------------
                json     21.987k (± 1.6%) i/s   (45.48 μs/i) -    220.108k in  10.013722s

master (as of commit `37e6890`)

% ONLY=json ruby -I"lib" benchmark/encoder-realworld.rb 

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   951.000 i/100ms
Calculating -------------------------------------
                json      9.517k (± 0.8%) i/s  (105.08 μs/i) -     96.051k in  10.093716s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    84.000 i/100ms
Calculating -------------------------------------
                json    843.486 (± 1.1%) i/s    (1.19 ms/i) -      8.484k in  10.059526s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   145.000 i/100ms
Calculating -------------------------------------
                json      1.448k (± 0.8%) i/s  (690.73 μs/i) -     14.500k in  10.016276s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.342k i/100ms
Calculating -------------------------------------
                json     23.073k (± 0.8%) i/s   (43.34 μs/i) -    231.858k in  10.049473s

samyron · 2025-08-13T19:53:02Z

+    private static final int DEFAULT_CAPACITY = 1024;
+
+    private int totalLength;
+    private byte[][] segments = new byte[21][];


Why 21? The minimum segment size is 1024 for the first segment. The code doubles the segment size for each additional segment. Based on this doubling, we only need 21 segments before we hit Integer.MAX_VALUE.

Makes sense. 👏

Maybe a comment or well-named constant so nobody else asks that question in the future?
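Here is a sketch of the segmented approach, with the 21 spelled out as a named constant along the lines suggested above. It illustrates the idea under assumptions and is not the PR's `SegmentedByteListDirectOutputStream`. `SegmentedBuffer`, `MAX_SEGMENTS` and the choice to throw when the last segment fills up are all hypothetical. The PR's commit list mentions handling capacity overflow past Integer.MAX_VALUE, but that code is not shown here.

```java
final class SegmentedBuffer {
    private static final int FIRST_SEGMENT_SIZE = 1024;
    // Segment i holds FIRST_SEGMENT_SIZE << i bytes. 21 segments give 1024 * (2^21 - 1)
    // = 2^31 - 1024 bytes in total; a 22nd segment would need 2^31 bytes, which does not
    // fit in an int array length, so 21 slots are enough.
    static final int MAX_SEGMENTS = 21;

    private final byte[][] segments = new byte[MAX_SEGMENTS][];
    private int segmentCount = 0;
    private int posInSegment = 0; // write position inside the last segment
    private int totalLength = 0;

    SegmentedBuffer() {
        segments[segmentCount++] = new byte[FIRST_SEGMENT_SIZE];
    }

    void write(byte[] src, int off, int len) {
        while (len > 0) {
            byte[] current = segments[segmentCount - 1];
            int room = current.length - posInSegment;
            if (room == 0) {
                grow();
                continue;
            }
            int n = Math.min(room, len);
            System.arraycopy(src, off, current, posInSegment, n);
            posInSegment += n;
            off += n;
            len -= n;
            totalLength += n;
        }
    }

    // Adds a segment twice as large as the previous one; bytes already written are never moved.
    private void grow() {
        if (segmentCount == MAX_SEGMENTS) {
            throw new OutOfMemoryError("buffer would exceed Integer.MAX_VALUE bytes");
        }
        segments[segmentCount] = new byte[segments[segmentCount - 1].length * 2];
        segmentCount++;
        posInSegment = 0;
    }

    int length() {
        return totalLength;
    }

    // Copies every segment into one contiguous array, once, when the output is finished.
    byte[] toByteArray() {
        byte[] out = new byte[totalLength];
        int pos = 0;
        for (int i = 0; i < segmentCount; i++) {
            int n = (i == segmentCount - 1) ? posInSegment : segments[i].length;
            System.arraycopy(segments[i], 0, out, pos, n);
            pos += n;
        }
        return out;
    }
}
```

Every segment except the last is always full, because a new one is only added when the current one has no room left. That is what lets `toByteArray` copy whole segments.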

samyron · 2025-08-14T03:23:32Z

Synthetic benchmarks of encoding an array of 128-byte ASCII strings.

benchmark_encoding "bytes.128.bestcase", ([("a" * 128)] * 10000)

SegmentedByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   256.000 i/100ms
Calculating -------------------------------------
                json      2.561k (± 0.9%) i/s  (390.48 μs/i) -     25.600k in   9.997219s

ByteListDirectOutputStream + Scalar (effectively the same code as master)

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   137.000 i/100ms
Calculating -------------------------------------
                json      1.376k (± 1.2%) i/s  (726.60 μs/i) -     13.837k in  10.055507s

SegmentedByteListDirectOutputStream + Scalar

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=false' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   141.000 i/100ms
Calculating -------------------------------------
                json      1.424k (± 0.8%) i/s  (702.28 μs/i) -     14.241k in  10.001896s

ByteListDirectOutputStream + SWAR

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=false -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   254.000 i/100ms
Calculating -------------------------------------
                json      2.558k (± 1.5%) i/s  (390.92 μs/i) -     25.654k in  10.030970s

Master

% ONLY=json ruby -I"lib" benchmark/encoder-synthetic.rb

== Encoding bytes.128.bestcase (1310001 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   134.000 i/100ms
Calculating -------------------------------------
                json      1.334k (± 3.6%) i/s  (749.69 μs/i) -     13.400k in  10.062253s

tompng · 2025-08-15T13:42:42Z

+
+        if (pos + 4 <= len) {
+            int x = bb.getInt(ptr + pos);
+            int is_ascii = 0x808080 & ~x;


This hex number only checks 3 bytes.
Maybe 0x808080 → 0x80808080

Great catch, thank you! Late night coding without my glasses...

Interestingly, no spec failed. I'll try to address that.
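Here is a small, self-contained reproduction of why the three-byte mask matters. The escape test around the mask is an assumed reconstruction, since only the `is_ascii` line appears in the diff above. `SwarMaskDemo`, `needsEscape` and `load` are hypothetical names. With a big-endian `ByteBuffer` (Java's default), `0x808080` drops the lane that holds the first byte in memory, so a leading `"` slips through.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SwarMaskDemo {
    // 4-byte SWAR check; highBits is the mask under review (0x808080 vs 0x80808080).
    static boolean needsEscape(int x, int highBits) {
        int isAscii = highBits & ~x;                        // lanes whose byte is < 0x80
        int ltSpaceOrQuote = (x ^ 0x02020202) - 0x21212121; // b < 0x20 or b == '"'
        int isBackslash = (x ^ 0x5C5C5C5C) - 0x01010101;    // b == '\\'
        return ((ltSpaceOrQuote | isBackslash) & isAscii) != 0;
    }

    // Big-endian load: s.charAt(0) ends up in the most significant byte.
    static int load(String s) {
        return ByteBuffer.wrap(s.getBytes(StandardCharsets.US_ASCII)).getInt(0);
    }

    public static void main(String[] args) {
        int x = load("\"abc");
        System.out.println(needsEscape(x, 0x808080));   // false: top lane never inspected
        System.out.println(needsEscape(x, 0x80808080)); // true
    }
}
```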

samyron · 2025-08-18T01:49:58Z

As of commit c3d02b08b0708b9fb6eec2fcd819224706418985, I refactored the SWAR implementation into its own subclass of StringEncoder. I did so after looking at the JITWatch suggestions, which hinted that the encodeBasic and encodeBasicSWAR methods could not be inlined into the StringEncoder#encode method. That implied there was at least one conditional and branch every time StringEncoder#encode was called when the SWAR implementation was used.
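Here is a sketch of that structure, under assumptions. The class names are hypothetical, and the escape set is limited to control characters, `"` and `\`. The 4-byte chunking mirrors the `pos + 4 <= len` loop in the reviewed diff. The implementation is picked once in `create`, so the per-string call no longer carries an `if (useSwar)` check. Whether HotSpot then inlines the override still depends on its profiling, although a call site that only ever sees one receiver type is typically easy to devirtualize.

```java
import java.nio.ByteBuffer;

// Hypothetical names: choose the scanning strategy once, then call a single override.
abstract class BasicScanner {
    // Index of the first byte that needs escaping, or -1.
    abstract int firstEscape(byte[] src);

    static int scalarFrom(byte[] src, int pos) {
        for (; pos < src.length; pos++) {
            int b = src[pos] & 0xFF;
            if (b < 0x20 || b == '"' || b == '\\') {
                return pos;
            }
        }
        return -1;
    }

    // Decided once (e.g. from a system property), so each encoder has a fixed type.
    static BasicScanner create(boolean useSwar) {
        return useSwar ? new SwarScanner() : new ScalarScanner();
    }
}

final class ScalarScanner extends BasicScanner {
    @Override
    int firstEscape(byte[] src) {
        return scalarFrom(src, 0);
    }
}

final class SwarScanner extends BasicScanner {
    @Override
    int firstEscape(byte[] src) {
        ByteBuffer bb = ByteBuffer.wrap(src);
        int pos = 0;
        // Check 4 bytes at a time while at least 4 remain; on a hit, fall back to
        // examining every byte of that chunk individually.
        while (pos + 4 <= src.length) {
            int x = bb.getInt(pos);
            int isAscii = 0x80808080 & ~x;
            int hits = ((x ^ 0x02020202) - 0x21212121) | ((x ^ 0x5C5C5C5C) - 0x01010101);
            if ((hits & isAscii) != 0) {
                break;
            }
            pos += 4;
        }
        return scalarFrom(src, pos);
    }
}
```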

Benchmarks as of this commit

SWAR + SegmentedByteListDirectOutputStream

% ONLY=json JAVA_OPTS='-Djson.useSegmentedOutputStream=true -Djson.useSWARBasicEncoder=true' ruby -I"lib" benchmark/encoder-realworld.rb
== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.821k i/100ms
Calculating -------------------------------------
                json     18.262k (± 0.8%) i/s   (54.76 μs/i) -    183.921k in  10.071834s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    90.000 i/100ms
Calculating -------------------------------------
                json    904.604 (± 1.4%) i/s    (1.11 ms/i) -      9.090k in  10.050832s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   192.000 i/100ms
Calculating -------------------------------------
                json      1.867k (± 9.8%) i/s  (535.52 μs/i) -     18.432k in  10.061992s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.488k i/100ms
Calculating -------------------------------------
                json     26.728k (± 4.9%) i/s   (37.41 μs/i) -    266.216k in  10.000183s

SWAR + ByteListDirectOutputStream

Note: This did seem like a particularly good run, at least for the activitypub.json benchmark.

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.866k i/100ms
Calculating -------------------------------------
                json     18.740k (± 0.7%) i/s   (53.36 μs/i) -    188.466k in  10.057320s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json    87.000 i/100ms
Calculating -------------------------------------
                json    875.255 (± 1.4%) i/s    (1.14 ms/i) -      8.787k in  10.041293s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json   183.000 i/100ms
Calculating -------------------------------------
                json      1.829k (± 1.3%) i/s  (546.89 μs/i) -     18.300k in  10.009902s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 24.0.1+9-30 on 24.0.1+9-30 +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.582k i/100ms
Calculating -------------------------------------
                json     25.316k (± 0.9%) i/s   (39.50 μs/i) -    255.618k in  10.097779s

samyron · 2025-08-18T01:53:22Z

I'm happy to disable the SegmentedByteListDirectOutputStream by default or remove it from this PR entirely. On my MacBook Air M1 it does seem to help a bit with some benchmarks, and it also seems somewhat more resilient to changes in the order of the benchmarks. However, it doesn't seem to help as much on my MacBook Pro M4. I don't have current benchmarks to post from the M4, but I will run them again as of the commit above tomorrow.

samyron · 2025-08-19T01:51:38Z

Benchmarks from a MacBook Pro M4. I ran these many times and the results vary a bit between runs; what follows is a random sample. The big surprise is the activitypub.json benchmark in the SegmentedByteListDirectOutputStream + Scalar results. I didn't expect it to make that big of a difference, especially considering the other benchmarks were much closer.

Note: while I don't have the benchmarks here, if I run the citm_catalog benchmark before activitypub, the SegmentedByteListDirectOutputStream does perform on both of those. The data shape of citm_catalog is quite different from that of activitypub, so HotSpot is probably making different decisions about what and how to optimize. It's also possible I'm not running the benchmarks long enough for the results to stabilize.

SegmentedByteListDirectOutputStream + SWAR

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.311k i/100ms
Calculating -------------------------------------
                json     23.693k (± 1.0%) i/s   (42.21 μs/i) -    473.755k in  19.997219s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   130.000 i/100ms
Calculating -------------------------------------
                json      1.290k (± 1.2%) i/s  (775.31 μs/i) -     25.870k in  20.060511s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   256.000 i/100ms
Calculating -------------------------------------
                json      2.544k (± 1.0%) i/s  (393.06 μs/i) -     50.944k in  20.026085s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     3.477k i/100ms
Calculating -------------------------------------
                json     34.263k (± 0.8%) i/s   (29.19 μs/i) -    688.446k in  20.094387s

ByteListDirectOutputStream + SWAR

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.246k i/100ms
Calculating -------------------------------------
                json     22.857k (± 1.2%) i/s   (43.75 μs/i) -    458.184k in  20.048535s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   134.000 i/100ms
Calculating -------------------------------------
                json      1.324k (± 1.4%) i/s  (755.18 μs/i) -     26.532k in  20.040921s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   269.000 i/100ms
Calculating -------------------------------------
                json      2.710k (± 1.4%) i/s  (368.97 μs/i) -     54.338k in  20.053211s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     3.700k i/100ms
Calculating -------------------------------------
                json     37.012k (± 1.1%) i/s   (27.02 μs/i) -    743.700k in  20.095805s

SegmentedByteListDirectOutputStream + Scalar

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     1.798k i/100ms
Calculating -------------------------------------
                json     18.096k (± 1.1%) i/s   (55.26 μs/i) -    363.196k in  20.073377s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   118.000 i/100ms
Calculating -------------------------------------
                json      1.184k (± 0.9%) i/s  (844.54 μs/i) -     23.718k in  20.032471s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   197.000 i/100ms
Calculating -------------------------------------
                json      1.980k (± 1.0%) i/s  (505.07 μs/i) -     39.597k in  20.001127s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     3.070k i/100ms
Calculating -------------------------------------
                json     30.296k (± 0.9%) i/s   (33.01 μs/i) -    607.860k in  20.065977s

ByteListDirectOutputStream + Scalar

== Encoding activitypub.json (52595 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   937.000 i/100ms
Calculating -------------------------------------
                json      9.393k (± 1.2%) i/s  (106.46 μs/i) -    188.337k in  20.052987s

== Encoding citm_catalog.json (500298 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json    96.000 i/100ms
Calculating -------------------------------------
                json    955.553 (± 0.9%) i/s    (1.05 ms/i) -     19.200k in  20.095038s

== Encoding twitter.json (466906 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json   182.000 i/100ms
Calculating -------------------------------------
                json      1.824k (± 1.3%) i/s  (548.28 μs/i) -     36.582k in  20.060380s

== Encoding ohai.json (20147 bytes)
jruby 9.4.12.0 (3.1.4) 2025-02-11 f4ab75096a OpenJDK 64-Bit Server VM 21.0.8+9-LTS on 21.0.8+9-LTS +jit [arm64-darwin]
Warming up --------------------------------------
                json     2.721k i/100ms
Calculating -------------------------------------
                json     26.785k (± 1.0%) i/s   (37.33 μs/i) -    536.037k in  20.014396s

headius · 2025-08-19T18:54:39Z

@samyron Great results! I think we could go ahead with this any time, pending my couple of minor review comments that should be addressed. The segmented stream is consistently faster than the old logic, and coupled with SWAR it can be much faster. I'd like to see this land so we can get back to playing with the vector API.

samyron · 2025-08-22T12:22:29Z

@headius I'm happy to address the comments but I don't see any review comments on this PR...

headius

Only minor changes needed

headius · 2025-08-13T19:43:36Z


    protected final byte[] escapeTable;

+    private static final String USE_SWAR_BASIC_ENCODER_PROP = "json.useSWARBasicEncoder";


Let's prefix this with `jruby.`, like other properties in JRuby and other libraries.

headius · 2025-08-14T03:18:34Z

+    private static final int DEFAULT_CAPACITY = 1024;
+
+    private int totalLength;
+    private byte[][] segments = new byte[21][];


Makes sense. 👏

Maybe a comment or well-named constant so nobody else asks that question in the future?

headius · 2025-08-22T19:53:33Z

@samyron D'oh, I had started a review but never submitted it. Just a couple of minor changes and we can merge.

headius · 2025-08-27T19:45:50Z

Ship it!


samyron mentioned this pull request Aug 13, 2025

Use Vector API in the Java Extension #824

Merged

samyron commented Aug 13, 2025


Comment thread java/src/json/ext/LinkedSegmentedByteListDirectOutputStream.java Outdated

byroot requested a review from headius August 13, 2025 14:03

samyron commented Aug 13, 2025


tompng reviewed Aug 15, 2025


headius requested changes Aug 22, 2025


byroot requested a review from headius August 27, 2025 18:44

headius approved these changes Aug 27, 2025


samyron added 8 commits August 28, 2025 20:03

Allow for segmented output streams and a SWAR-based basic StringEncoder implementation.

5274d5d

Handle the case were the capacity overflows Integer.MAX_VALUE.

b2644ff

Remove the LinkedSegmentedByteListDirectOutputStream in favor of the SegmentedByteListDirectOutputStream.

32f3287

Use a ternary to determine the capacity of the next segment when growing the output buffer.

44b1d87

Use SWAR if there is still at least 4 bytes remaining.

a458201

Ensure the SWAR encoder in the java extension checks every byte.

9ebe105

Refactor the SWAR logic into a separate subclass of StringEncoder.

052198a

Refactor the logic to evaluate every byte in the chunk if there is a byte in that chunk that needs escaping.

43a8a83

byroot force-pushed the sm/use-segmented-outputstream-and-swar branch from 182105a to 43a8a83 August 28, 2025 18:03

byroot merged commit ae83838 into ruby:master Aug 28, 2025
35 checks passed

byroot mentioned this pull request Aug 28, 2025

Improve buffer abstraction's encoding handling in JRuby dumper #760

Closed

3 tasks

byroot mentioned this pull request Sep 18, 2025

IndexOutOfBoundsException during SWAR string encoding #859

Closed


4 participants
